Special Issue: Statistics and Applied Probability: A Tribute to Jeffrey J. Hunter
Research Article | Open Access
Alastair Scott, Chris Wild, "Methods for Stratified Cluster Sampling with Informative Stratification", Advances in Decision Sciences, vol. 2007, Article ID 056372, 12 pages, 2007. https://doi.org/10.1155/2007/56372
Methods for Stratified Cluster Sampling with Informative Stratification
We look at fitting regression models using data from stratified cluster samples when the strata may depend in some way on the observed responses within clusters. One important subclass of examples is that of family studies in genetic epidemiology, where the probability of selecting a family into the study depends on the incidence of disease within the family. We develop the survey-weighted estimating equation approach for this problem, with particular emphasis on the estimation of superpopulation parameters. Full maximum likelihood for this class of problems involves modelling the population distribution of the covariates, which is simply not feasible when there are a large number of potential covariates. We discuss efficient semiparametric maximum likelihood methods in which the covariate distribution is left completely unspecified. We further discuss the relative efficiencies of these two approaches, the survey-weighted and the semiparametric maximum likelihood.
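As a rough illustration of the survey-weighted estimating equation idea mentioned in the abstract, the sketch below simulates families, selects them with a probability that depends on whether any member is a case, and then solves a weighted logistic score equation in which each sampled family carries the inverse of its selection probability. Everything in it is an assumption made for illustration and is not taken from the paper: the function names, the family size of three, the single covariate, the parameter values, and the selection probabilities. Members are also simulated independently within a family, which ignores the within-cluster dependence that the paper is concerned with.

```python
import math
import random


def expit(t):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-t))


def simulate_population(n_families, beta0, beta1, rng):
    """Hypothetical superpopulation: 3 members per family, each with (x, y)."""
    families = []
    for _ in range(n_families):
        members = []
        for _ in range(3):
            x = rng.gauss(0.0, 1.0)
            y = 1 if rng.random() < expit(beta0 + beta1 * x) else 0
            members.append((x, y))
        families.append(members)
    return families


def stratified_sample(families, p_case, p_control, rng):
    """Informative stratification: selection depends on the family's responses.

    Returns (weight, family) pairs, with weight = 1 / selection probability.
    """
    sample = []
    for fam in families:
        p = p_case if any(y for _, y in fam) else p_control
        if rng.random() < p:
            sample.append((1.0 / p, fam))
    return sample


def fit_weighted_logistic(sample, n_iter=25):
    """Solve sum_i w_i sum_j (1, x_ij)' (y_ij - p_ij) = 0 by Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        u0 = u1 = 0.0          # weighted score components
        h00 = h01 = h11 = 0.0  # weighted information matrix entries
        for w, fam in sample:
            for x, y in fam:
                p = expit(b0 + b1 * x)
                r = w * (y - p)
                v = w * p * (1.0 - p)
                u0 += r
                u1 += r * x
                h00 += v
                h01 += v * x
                h11 += v * x * x
        det = h00 * h11 - h01 * h01
        b0 += (h11 * u0 - h01 * u1) / det
        b1 += (h00 * u1 - h01 * u0) / det
    return b0, b1


if __name__ == "__main__":
    rng = random.Random(2007)
    pop = simulate_population(20000, beta0=-3.0, beta1=1.0, rng=rng)
    sample = stratified_sample(pop, p_case=1.0, p_control=0.05, rng=rng)
    print("weighted:  ", fit_weighted_logistic(sample))
    # Setting every weight to 1 ignores the design and shows the selection bias.
    print("unweighted:", fit_weighted_logistic([(1.0, f) for _, f in sample]))
```

In this toy setting the weighted fit should land near the generating values, while the unweighted fit overstates the disease rate because families containing cases are over-represented. The sketch gives point estimates only; standard errors would need a design-based (sandwich) variance estimator, which is not shown here.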
Copyright © 2007 Alastair Scott and Chris Wild. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.