Advances in Decision Sciences

Special Issue

Statistics and Applied Probability: A Tribute to Jeffrey J. Hunter

Research Article | Open Access

Volume 2007 | Article ID 056372 | https://doi.org/10.1155/2007/56372

Alastair Scott, Chris Wild, "Methods for Stratified Cluster Sampling with Informative Stratification", Advances in Decision Sciences, vol. 2007, Article ID 056372, 12 pages, 2007. https://doi.org/10.1155/2007/56372

Methods for Stratified Cluster Sampling with Informative Stratification

Academic Editor: Paul Cowpertwait
Received 24 Apr 2007
Accepted 08 Aug 2007
Published 16 Oct 2007

Abstract

We consider fitting regression models to data from stratified cluster samples when the strata may depend in some way on the observed responses within clusters. One important subclass of examples is that of family studies in genetic epidemiology, where the probability of selecting a family into the study depends on the incidence of disease within the family. We develop the survey-weighted estimating equation approach for this problem, with particular emphasis on the estimation of superpopulation parameters. Full maximum likelihood for this class of problems involves modelling the population distribution of the covariates, which is not feasible when there are many potential covariates. We discuss efficient semiparametric maximum likelihood methods in which the covariate distribution is left completely unspecified. We further discuss the relative efficiencies of these two approaches.
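
To make the survey-weighted estimating equation idea concrete, the following is a minimal simulation sketch, not the authors' implementation. It assumes a logistic model, families of fixed size whose members respond independently given their covariate (a simplification; real family data are correlated), two strata defined by whether a family contains at least one case, and known selection probabilities. All names and numeric settings (`weighted_logistic_fit`, `beta_true`, the 0.1 sampling rate, and so on) are hypothetical choices for illustration. The weighted fit solves sum_i w_i x_i (y_i - p_i(beta)) = 0 with w_i equal to the inverse of the family's selection probability; variance estimation is not shown.

```python
import numpy as np

def weighted_logistic_fit(X, y, w, n_iter=50, tol=1e-10):
    """Solve the weighted score equation sum_i w_i x_i (y_i - p_i) = 0
    for a logistic model by Newton-Raphson."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        score = X.T @ (w * (y - p))
        info = (X * (w * p * (1.0 - p))[:, None]).T @ X
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

rng = np.random.default_rng(0)

# Hypothetical finite population: n_fam families with m members each.
n_fam, m = 50_000, 4
beta_true = np.array([-3.0, 1.0])          # assumed intercept and slope
x = rng.normal(size=(n_fam, m))
p = 1.0 / (1.0 + np.exp(-(beta_true[0] + beta_true[1] * x)))
y = rng.binomial(1, p)                      # members independent given x (assumption)

# Informative stratification: the stratum depends on responses within the family.
has_case = y.max(axis=1) == 1
pi = np.where(has_case, 1.0, 0.1)           # assumed known selection probabilities
keep = rng.random(n_fam) < pi

# Flatten the sampled families to one row per person; every member
# inherits the family's inverse-probability weight.
xs = x[keep].ravel()
ys = y[keep].ravel().astype(float)
ws = np.repeat(1.0 / pi[keep], m)
Xs = np.column_stack([np.ones_like(xs), xs])

beta_weighted = weighted_logistic_fit(Xs, ys, ws)
beta_naive = weighted_logistic_fit(Xs, ys, np.ones_like(ys))

print("true    :", beta_true)
print("weighted:", beta_weighted)
print("naive   :", beta_naive)
```

In this sketch the unweighted fit ignores the selection mechanism, so its intercept is pulled upward by the over-representation of families with cases, while the weighted fit targets the population parameters.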

Copyright © 2007 Alastair Scott and Chris Wild. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

