Journal of Biomedicine and Biotechnology
Volume 2005 (2005), Issue 2, Pages 113-123
Research article

Combining Information From Multiple Data Sources to Create Multivariable Risk Models: Illustration and Preliminary Assessment of a New Method

1Department of Biostatistics and Bioinformatics, Duke University Medical Center, Durham, USA
2Center for Clinical Health Policy Research, Duke University Medical Center, Durham 27705, NC, USA
3BioSignia, Inc, 1822 East NC Highway 54, Durham 27713, NC, USA

Received 5 February 2004; Revised 29 March 2004; Accepted 6 April 2004

Copyright © 2005 Hindawi Publishing Corporation. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


A common practice in meta-analysis is to combine the results of numerous studies on the effect of a risk factor on a disease outcome. If several such composite relative risks are estimated from the medical literature for a specific disease, they cannot be combined in a multivariate risk model, as is often done within individual studies, because no methods are available to overcome the problems of risk factor collinearity and heterogeneity among the different cohorts. We propose a solution to these problems for general linear regression of continuous outcomes, using a simple example in which two independent variables from two sources are combined to estimate a joint outcome. We demonstrate that when the underlying data characteristics (correlation coefficients, standard deviations, and univariate betas) are explicitly varied over a wide range, the predicted outcomes remain reasonable estimates of the empirically derived outcomes (the gold standard). The method shows the most promise in situations where the primary interest is in generating predicted values, such as identifying a high-risk group of individuals. The resulting partial regression coefficients are less robust than the predicted values.
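
The data characteristics named in the abstract (correlation coefficients, standard deviations, and univariate betas) are linked to partial regression coefficients by a standard least-squares identity: the partial slopes equal the inverse predictor covariance matrix times the predictor-outcome covariances. The sketch below illustrates that identity for two predictors. It is not the authors' published procedure and does not address cohort heterogeneity; the function name `partial_betas` and all numeric inputs are hypothetical, and it assumes every quantity describes the same population.

```python
import numpy as np


def partial_betas(univariate_betas, sds, corr):
    """Convert univariate slopes to partial (multivariable) slopes.

    Hypothetical helper for illustration only. Uses the least-squares
    identity b = Sigma^{-1} c, where Sigma is the predictor covariance
    matrix and c_j = cov(x_j, y) = univariate_beta_j * sd_j**2.
    """
    b_u = np.asarray(univariate_betas, dtype=float)
    s = np.asarray(sds, dtype=float)
    r = np.asarray(corr, dtype=float)

    sigma = r * np.outer(s, s)   # covariance matrix of the predictors
    cov_xy = b_u * s**2          # covariance of each predictor with the outcome
    return np.linalg.solve(sigma, cov_xy)


# Made-up inputs: two risk factors, each slope taken from a different source.
univariate = [0.5, 0.3]          # assumed univariate betas
sds = [1.0, 2.0]                 # assumed predictor standard deviations
corr = [[1.0, 0.4],
        [0.4, 1.0]]              # assumed correlation between the predictors

b = partial_betas(univariate, sds, corr)

# Predicted outcome deviation for an individual who is one SD above the
# mean on both predictors (intercept omitted in this sketch).
x_dev = np.array([1.0, 2.0])
predicted_dev = float(x_dev @ b)
```

With uncorrelated predictors the partial slopes reduce to the univariate ones, so any adjustment comes entirely from the assumed correlation.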