Computational and Mathematical Methods in Medicine

Volume 2015 (2015), Article ID 801031, 11 pages

http://dx.doi.org/10.1155/2015/801031

## The Covariance Adjustment Approaches for Combining Incomparable Cox Regressions Caused by Unbalanced Covariates Adjustment: A Multivariate Meta-Analysis Study

^{1}Department of Biostatistics, Faculty of Medicine, Shiraz University of Medical Sciences, P.O. Box 71345-1874, Shiraz, Iran^{2}Department of Biostatistics, Infertility Research Center, Shiraz University of Medical Sciences, P.O. Box 71345-1874, Shiraz, Iran

Received 21 May 2015; Revised 28 July 2015; Accepted 6 August 2015

Academic Editor: Qi Dai

Copyright © 2015 Tania Dehesh et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

*Background*. Univariate meta-analysis (UM) procedure, as a technique that provides a single overall result, has become increasingly popular. Neglecting the existence of other concomitant covariates in the models leads to loss of treatment efficiency. Our aim was
proposing four new approximation approaches for the covariance matrix of the coefficients, which is not readily available for the multivariate generalized least square (MGLS) method as a multivariate meta-analysis approach.* Methods*. We evaluated the efficiency of four new approaches including zero correlation (ZC), common correlation (CC), estimated correlation (EC), and multivariate multilevel correlation (MMC) on the estimation bias, mean square error (MSE), and 95% probability coverage of the confidence interval (CI) in the synthesis of Cox proportional hazard models coefficients in a simulation study.* Result*. Comparing the results of the simulation study on the MSE, bias, and CI of the estimated coefficients indicated that MMC approach was the most accurate procedure compared to EC, CC, and ZC procedures. The precision ranking of the four approaches according to all above settings was MMC ≥ EC ≥ CC ≥ ZC.* Conclusion*. This study highlights advantages of MGLS meta-analysis on UM approach. The results suggested the use of MMC procedure to overcome the lack of
information for having a complete covariance matrix of the coefficients.

#### 1. Introduction

Meta-analysis is widely accepted as a systematic combination procedure using available evidence of independent studies for the purpose of a single overall result for the treatment of interest. This statistical procedure is becoming increasingly popular with medical researchers, particularly in clinical trials and survival analysis [1, 2]. Survival analysis is performed when the time is as the outcome variable and the Cox proportional hazard model is a famous procedure in this field [3]. Meta-analysis attempts to achieve a comprehensive result due to two procedures based on data availability: individual patient data (IPD) meta-analysis or aggregate patient data (APD) meta-analysis. It means that if individual’s data from different studies are available, IPD is preferred, but if it is impossible to assemble the individual’s raw data, then the APD should be chosen for combining the results of different studies. An IPD procedure requires individual data collection, whereas APD is based on summarized data that is extracted from published reports. While APD results are not as desirable as IPD, the majority of meta-analysis work relies on APD due to time availability and financial restriction [4–7].

Selection of homogeneous and related studies that address the same question is the first difficult task in each APD meta-analysis and has been debated by many authors [4, 8]. An additional problem is the fact that primary studies have diverse methodology and design protocols, such as differences in sample size and different covariate adjustment [9]. Of course, different covariate adjustment in different studies is a troublesome issue in any meta-analysis. This mis djustment of covariates may occur because of a decision by the analyst. Two suggested ways that have been used to date are (i) restricting the analysis to those models that have exactly the same set of covariates or (ii) using all available studies but omitting some important covariates. Both suggested ways lead to a waste of integrating information. As we know, omitting important concomitant covariates in nonlinear models leads to estimation bias and misleading interpretation, while the power is also decreased for no treatment effect detection even in primary studies [10–12]. In the Cox model, this covariates’ omission leads to estimation bias toward zero for treatment effects [13, 14].

Nowadays, in APD, almost all researchers attempt to achieve the result for the desired variable by UM and neglect the existence of other covariates in the model. This could be problematic in meta-analysis. Several researchers have encountered this problem. With the exception of the limited number of studies that propose synthesis slopes in linear regression models [15], the others did not propose applicable solution methods, especially in nonlinear model fields. To combine incomparable Cox regression models, Yuan and Anderson proposed two approaches in 2010 that use the concomitant covariates in combining incomparable Cox models but in a UM manner and focused on a single interested covariate rather than estimating the influences of other concomitant covariates [16].

A multivariate meta-analysis approach can be one of the best solutions to overcome the problem of combining incomparable models by accounting for existences of all covariates in a meta-analysis simultaneously [17]. By using this approach in meta-analysis, no covariate or information is omitted, which leads to a more complete model with the estimated influences of all covariates to date. In addition to saving information on all covariates, which had previously used significant financial collection expenses in the primary studies, a complete model can be helpful in taking more confident therapeutic decisions. A useful and simple procedure in the multivariate meta-analysis is a generalized least square (GLS) estimation method. However, performing this method as a meta-analysis procedure has some difficulties due to lack of key information required [15, 17].

The main objective of this study is first to apply a MLGS method of synthesis of incomparable Cox regression slopes in order to have a complete Cox proportional hazard model with the effects of all available covariates and then to propose four new approaches to approximate the covariance matrix of coefficients as a necessary part of MGLS performance. We evaluated these four new proposed approaches by some statistical features. This paper also provides a general insight into the advantages of MGLS estimation over routine UM procedures that are usually used in meta-analysis.

#### 2. Materials and Methods

##### 2.1. Review of Conventional Univariate Meta-Analysis

The popular and used univariate meta-analysis approaches are based on weighted mean estimator described by A. Whitehead and J. Whitehead [18] and DerSimonian and Laird [1]. In a meta-analysis, we encounter two approaches, fixed effect and random effect analysis. If all the studies are assumed to have the same common treatment effects and differences between the studies are assumed to be due to chance, the fixed effect procedure is suitable, but the final result cannot be generalized to the population [19]. In the random effect modeling approach, the overall study variations are divided in to two parts: between-study variation represented by a random term () and within-study variation represented by () in the model. The results of random effect meta-analysis can be generalized to the population. The random term () is assumed to have a normal distribution with mean zero and unknown variance. is called measurement error and has a normal distribution with zero mean and known variance . Hence, if we denote by the treatment effect or estimated log hazard ratio of the th study and is the true overall effect, then a simple model is written as (fixed effects model) and (random effects model). We assume that is approximately distributed as .

In conventional weighted mean approaches, using the reciprocal of variance estimation as the weight, the total is estimated as with , where is the weight given to study .

An approximate of under the fixed effects assumption is and, under the random effects, reciprocal of between-study variances is added to the weights: . Thus, studies with smaller variance are given more weight than those with large variances. Attribution of any weight to individual study is permitted, but since this weight is reported to provide the highest precision for total treatment effect [20], almost all meta-analysis researchers have used it.

The famous statistics to test the homogeneity of treatment effect across all studies is Cochran’ statistic [21], which is constructed based on differences between the pooled estimate effect and each study effect.

Under the null hypothesis of homogeneity of treatment effect across all studies, , follows distribution with () degrees of freedom. test is reported to have a low power, especially when the number of primary studies is small [22].

Another measurement of heterogeneity is provided by referring to index as follows: . The percentage of total variability due to between-study variability is interpreted by index. Recently, has become more well known in meta-analysis, because it indicates the percentage magnitude of heterogeneity [23].

##### 2.2. MGLS in Meta-Analysis with Cox Models

Recently, the synthesis of regression slopes has received more attention in meta-analysis [24]. We illustrate synthesis of the incomparable Cox regression model slopes based on MGLS approach, presented by Becker and Wu previously [15].

Suppose that we have covariates in whole studies that participate in meta-analysis and suppose that is the interested variable that existed in studies and the other covariates exist only for adjustment. It means that each study has some of the covariates, not all of them. Furthermore, there is no treatment by covariate interaction. A full adjustment Cox model is as follows:For convenience, we illustrated MGLS estimation method in combining incomparable Cox models. Suppose that we want to combine the results of three studies of Cox models in MGLS approach and we have four covariates besides which is the interested variable in whole studies. In the first study, we have the coefficients of these covariates , in the second study , and in the last one . For each covariate, an indicator variable was defined that takes the value of 1 if those covariates exist in the study and zero otherwise. Here we assume coefficients as responses and construct a multivariate format. In fact, we want to estimate the influence of all covariates, not the hazards in different times, so the multivariate approach is acceptable. The MGLS for the above example can be shown in a matrix form as follows:In above matrix form, is the coefficient of covariate in study (), . An alternative writing of the above model is , where is a vector of all covariate coefficients from entire studies, is a matrix that contains 1 and 0 in each row representing covariate existence in each study and the columns contain all covariates (here covariates) in meta-analysis, is a vector of sampling errors, and is a vector of random effects that is computed from between-coefficient variability. As we know, the best linear unbiased estimator of with MGLS procedure is and the covariance matrix of with a large sample is asymptotically normally distributed: (). If we consider the th diagonal element of as and if is a multivariate normal, then the confidence interval and hypothesis test for each is available: , where is the upper tail critical value of standard normal distribution.

A homogeneity test of all coefficients across studies under the normality assumption for is given as follows.

which has a large sample with () degrees of freedom, where and are the number of studies and covariates in all studies, respectively.

As we can observe clearly, all the estimations depend on the blockwise diagonal covariance matrix of coefficients (). Without having a complete coefficients covariance matrix () or a suitable estimated coefficients covariance matrix (), all MGLS estimates have computation problems. For instance, the covariance matrix of coefficients in the above example is as follows:, , and are three covariate coefficient vectors of three primary studies. The major limitation and problem that has been presented previously is lack of actual complete coefficients covariance matrixes from primary studies [17, 19]. The multivariate coefficients covariance matrix is a blockwise diagonal that includes the variance of covariate coefficients on its diagonal, which can almost always be found in the Cox model results and between-coefficients covariances on off-diagonal parts which are rarely reported even in recently published papers.

We attempted to propose approximations for the covariance of covariate coefficients and construct a covariance matrix as close as possible to the actual to have the MGLS coefficients estimates finally.

##### 2.3. Four New Approaches

Suppose that, in one of the primary Cox models, we have coefficients and . From basic statistical laws, the covariance of these two coefficients can be obtained by; this can be generalized to all coefficients in whole studies:, . So if we have a correlation value between each paired coefficients, the covariance can be calculated simply. Therefore, we propose four approaches for the correlation calculation to approximate coefficients covariances, which is one of the main purposes of this study.

###### 2.3.1. Zero Correlation (ZC)

We can assume that the authors during primary studies reported a very qualified model in the initial studies that completely checked for lack of multicollinearity. We can ignore the correlations and take them as zero in (4), so matrixes are a diagonal that has only actual available variance of the coefficient on its diagonal.

###### 2.3.2. Common Correlation (CC)

Lack of multicollinearity is rather unlikely, even when considered optimistic. In a very qualified model, a little multicollinearity is unavoidable. Therefore, we can take a little common correlation value among all coefficients in all studies. For example, we can assume. This assumption does not have any influence in calculation of , because it is a common value for all coefficients. By substitution of this common value in (4), all covariances can be calculated.

###### 2.3.3. Estimated Correlation (EC)

In this approach, we can extract all similar coefficients from all studies that participated in meta-analysis. After extracting, we must put similar coefficients in the same vectors . Therefore, we have some vectors, but with different lengths, because some of the covariates may participate in fewer studies than others. Then, the correlation between these vectors can be used as the correlations part in (4). The benefit of this approach is the fact that we use completely real available information in correlation computation instead of zero or common values based on educated guesses that have been used in two previous approaches. This approach also has a drawback and limitation; it is useful only in those meta-analyses that have many primary studies. The reason is described in the following paragraph.

One important point that should be paid attention to is that we must extract the covariate coefficients that has similar concomitant coefficients along themselves in the same study. For example, if there are two studies with and , these two coefficients cannot be in the same vector because they have different concomitant covariate coefficients with each other that have influence on their values. For a more detailed illustration, assume that we have only eight primary studies in a meta-analysis where exists in all of them as a desired variable and the other four covariates (2, 3, 4, and 5) participated randomly in each of the models as follows: *Study 1*: . *Study 2*: . *Study 3*: . *Study 4*: . *Study 5*: . *Study 6*: . *Study 7*: . *Study 8*: .In the above example, we have only 3 vectors, , , and , the elements of which come from the second, fifth and eighth studies, so the three correlations coefficients are computable. As we can observe, the other coefficients in other studies have different concomitant coefficients with themselves; therefore, they cannot be in the same vectors and cannot have correlation. In fact, in this example, we have five coefficients in whole 8 studies , but we can calculate the correlations, only between three of them . For the other coefficients, we take their correlation values as zero. Logically, when we have more studies, this problem does not occur and we can obtain vectors with longer lengths for all coefficients and therefore all correlations are calculable.

###### 2.3.4. Multivariate Multilevel Correlation (MMC)

The final suggestion is to look at the studies and covariate coefficients as a multivariate multilevel model. Goldestein has explained that multivariate response data are conveniently incorporated into multilevel models by creating an extra level below the original level 1 to define multivariate structure. There are several interesting features of this model. This model does not have level 1 variability because level 1 exists only to define multivariate structure. Level 2 variances and covariance are the between-studies variation. Another important feature is the fact that the multivariate multilevel estimates are statistically efficient even where some responses are missing in meta-analysis of some studies that do not have some of the coefficients [25]. We have two levels: covariates coefficients as level 1 are nested in studies as level 2. Each response was formulated as follows:where is the index for the study and is for covariates in all studies and is the random term of the responses. We have a covariance and a correlation matrix for the random part between all covariates. Each response or each coefficient was formulated separately. For example, for two coefficients, the formulas are as follows:So these random parts are as follows:From this covariance matrix, the correlations between coefficients can be calculated and substituted in (4) and then matrix obtained finally.

MLwiN 2.3 is software for doing multivariate multilevel analysis that is linked to R software recently and all above calculations can be done using this software.

##### 2.4. Mean Absolute Percentage Error (MAPE)

Several statistics for model checking are available, but when we have lack of sufficient information, for example, in a meta-analysis, MAPE can be a suitable choice.

The coverage probability of 95% of Wald (W) and Bonferroni (B) CI was also calculated as another evidence for comparing the efficiencies of the four new approaches. The number of time that all 95% BCI cover real coefficients values of all coefficients simultaneously among 2000 simulations were also calculated for each of the four approaches. The MAPE and WCI and BCI formulas are presented in the Appendix.

##### 2.5. Simulation Studies

We explore some statistical properties of four new approaches in terms of MSE, estimation bias, the amount of reduction in MAPE, and the coverage probability of 95% WCI and BCI in R software. We simulated survival times as the first required part in a simulation of the Cox model based on the procedure described by Blender and his coworkers in 2011 and Austin in 2010 [26, 27]. We generated a Cox model with five covariates similar to that observed in the male breast cancer clinical trials, as an example of a rare cancer for which we had survival data on. Our simulation design for obtaining Cox beta coefficients followed the procedure used by Yuan and Anderson in 2010 [16]. We assumed and generated Cox models with five covariates. To simulate covariates similar to those observed in a male breast cancer, was generated from a Bernoulli distribution with to represent a treatment indicator, was generated as a covariate of centered age from a normal distribution with mean 0 and variance 100, was generated as a covariate of tumor size of distribution with four degrees of freedom, and was generated from a Bernoulli variable with as an indicator of auxiliary lymph node involvement. stands for the number of lymph nodes involved with male breast cancer, generated from an exponential distribution with the rate of 1/4, rounded up to an integer and a modification applied to them because a negative nodal status () would occur in roughly 40% of patients. To generate the survival times, the values coefficients of to were set tobased on real values similar to those observed in male breast cancer patients. exists in all studies, but other covariates were chosen without replacement from to . The survival times were randomly censored with probability 0.1. The baseline failure time is generated from an exponential distribution with . The number of primary studies was set in turn to 20, 25, 30, 35, 40, and 45. The number of patients in each study was randomly picked up from 100 to 500 and survival times for each study were censored with probability randomly chosen between 0.1 and 0.4.

After extracting covariates coefficients from different simulated Cox studies, matrix was constructed by arranging all from different studies under each other.

matrix was constructed based on the four proposed methods and substituted in the MGLS estimation formula. If the heterogeneity of studies was rejected, then the variance between each pair of coefficients (as a random parts of the model) is added to the diagonal elements of matrix. Then the final covariate coefficient was estimated from the MGLS procedure. We generated 2000 random data sets for each simulation scenario and all statistical settings are the average of these 2000 simulations. The multivariate multilevel covariance matrixs were calculated by the R2MLwiN package in R software, like all other simulation procedures.

#### 3. Results

Table 1 shows the bias, standard deviation (SD), and MSE of four proposed methods under different number of studies ( from 20 to 45), each for 2000 simulations. The result of ZC method is similar to the traditional weighted mean meta-analysis that is used routinely in meta-analysis work, especially for that exists in all studies.